Glocal alignment: finding rearrangements during alignment

Authors

  • Michael Brudno
  • Sanket Malde
  • Alexander Poliakov
  • Chuong B. Do
  • Olivier Couronne
  • Inna Dubchak
  • Serafim Batzoglou
Abstract

MOTIVATION: To compare entire genomes from different species, biologists increasingly need alignment methods that are efficient enough to handle long sequences, and accurate enough to correctly align the conserved biological features between distant species. The two main classes of pairwise alignments are global alignment, where one string is transformed into the other, and local alignment, where all locations of similarity between the two strings are returned. Global alignments are less prone to demonstrating false homology as each letter of one sequence is constrained to being aligned to only one letter of the other. Local alignments, on the other hand, can cope with rearrangements between non-syntenic, orthologous sequences by identifying similar regions in sequences; this, however, comes at the expense of a higher false positive rate due to the inability of local aligners to take into account overall conservation maps.

RESULTS: In this paper we introduce the notion of glocal alignment, a combination of global and local methods, where one creates a map that transforms one sequence into the other while allowing for rearrangement events. We present Shuffle-LAGAN, a glocal alignment algorithm that is based on the CHAOS local alignment algorithm and the LAGAN global aligner, and is able to align long genomic sequences. To test Shuffle-LAGAN we split the mouse genome into BAC-sized pieces, and aligned these pieces to the human genome. We demonstrate that Shuffle-LAGAN compares favorably in terms of sensitivity and specificity with standard local and global aligners. From the alignments we conclude that about 9% of human/mouse homology may be attributed to small rearrangements, 63% of which are duplications.
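The idea of a map that follows one sequence while tolerating rearrangements in the other can be illustrated with a small chaining sketch. This is a simplified illustration and not the Shuffle-LAGAN algorithm itself: the `Hit` record, the `glocal_chain` function, and the flat `rearrangement_penalty` are all hypothetical names and choices made for this example. It assumes local hits are already available as half-open intervals with a score, keeps them ordered along sequence 1 only, and charges a penalty whenever consecutive hits are out of order in sequence 2.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hit:
    """A local alignment: [start1, end1) in sequence 1, [start2, end2) in sequence 2."""
    start1: int
    end1: int
    start2: int
    end2: int
    score: float


def glocal_chain(hits: List[Hit], rearrangement_penalty: float = 5.0) -> List[Hit]:
    """Return the best-scoring chain of hits that is ordered along sequence 1.

    Hits may appear out of order in sequence 2, but each such jump costs
    `rearrangement_penalty` (an illustrative, assumed scoring rule).
    """
    ordered = sorted(hits, key=lambda h: (h.start1, h.end1))
    n = len(ordered)
    if n == 0:
        return []

    best = [h.score for h in ordered]  # best chain score ending at each hit
    prev = [-1] * n                    # back-pointers for chain recovery

    for j in range(n):
        for i in range(j):
            a, b = ordered[i], ordered[j]
            if a.end1 > b.start1:
                continue  # hits overlap in sequence 1, cannot be chained
            collinear = a.end2 <= b.start2
            penalty = 0.0 if collinear else rearrangement_penalty
            candidate = best[i] + b.score - penalty
            if candidate > best[j]:
                best[j] = candidate
                prev[j] = i

    # Walk back from the best chain end.
    j = max(range(n), key=best.__getitem__)
    chain = []
    while j != -1:
        chain.append(ordered[j])
        j = prev[j]
    return chain[::-1]
```

With a small penalty the chain keeps a hit that jumps backwards in sequence 2, which is the glocal behaviour; with a very large penalty it degenerates to an ordinary collinear (global-style) chain. The quadratic loop is only suitable for small inputs.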

Similar resources

Index-based map-to-sequence alignment in large eukaryotic genomes

Resolution of complex repeat structures and rearrangements in the assembly and analysis of large eukaryotic genomes is often aided by a combination of high-throughput sequencing and mapping technologies (e.g. optical restriction mapping). In particular, mapping technologies can generate sparse maps of large DNA fragments (150 kbp–2 Mbp) and thus provide a unique source of information for disamb...

OPTIMA: sensitive and accurate whole-genome alignment of error-prone genomic maps by combinatorial indexing and technology-agnostic statistical analysis

BACKGROUND Resolution of complex repeat structures and rearrangements in the assembly and analysis of large eukaryotic genomes is often aided by a combination of high-throughput sequencing and genome-mapping technologies (for example, optical restriction mapping). In particular, mapping technologies can generate sparse maps of large DNA fragments (150 kilo base pairs (kbp) to 2 Mbp) and thus pr...

Handling Rearrangements in DNA Sequence Alignment

Sequence alignment is one of the core problems of bioinformatics, with a broad range of applications such as genome assembly, gene identification, and phylogenetic analysis [1]. Alignments between DNA sequences are used to infer evolutionary or functional relationships between genes. Evolution occurs through DNA mutations, which include small-scale edits and larger-scale rearrangement events. T...

An Improved Algorithm for Genome Rearrangements

A remarkable pattern of evolution is that many species have closely related gene sequences but differ dramatically in gene order. This raises a new challenge in aligning two genome sequences: we have to consider changes at both the nucleotide level and the locus level, such as gene rearrangements, duplication, or loss. Finding the series of rearrangements at the same time with changes at nuc...

Computational Biology Lecture 18: Genome rearrangements, finding maximal matches

One possibility is to perform a global alignment of the two strings x and y with a special scoring scheme; for instance, +1 for a match, 0 for a mismatch, and 0 for a gap. Then we could identify all the maximal positively scoring chunks of the alignment. The disadvantages of this approach are that it requires O(mn) running time, might not obtain all candidate matches, and obtains matches that are...
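The approach in this excerpt can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the lecture: the function name `match_chunks` is hypothetical, and "maximal positively scoring chunks" is interpreted here as maximal runs of consecutive matched columns in one optimal alignment. With +1 for a match and 0 for mismatches and gaps, the dynamic program is the same as the one for the longest common subsequence.

```python
from typing import List, Tuple


def match_chunks(x: str, y: str) -> List[Tuple[int, int, int]]:
    """Globally align x and y (+1 match, 0 mismatch, 0 gap) and return the
    maximal runs of matched columns as (start_in_x, start_in_y, length)."""
    m, n = len(x), len(y)

    # score[i][j] = best alignment score of x[:i] against y[:j]; O(mn) time and space.
    score = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = score[i - 1][j - 1] + (1 if x[i - 1] == y[j - 1] else 0)
            score[i][j] = max(diag, score[i - 1][j], score[i][j - 1])

    # Trace back one optimal alignment, grouping consecutive matches into runs.
    chunks = []
    i, j, run = m, n, 0
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1] and score[i][j] == score[i - 1][j - 1] + 1:
            run += 1
            i -= 1
            j -= 1
        else:
            if run:
                chunks.append((i, j, run))  # the run starts at x[i], y[j]
                run = 0
            if score[i][j] == score[i - 1][j]:
                i -= 1  # gap in y
            else:
                j -= 1  # gap in x
    if run:
        chunks.append((i, j, run))
    return chunks[::-1]
```

For example, `match_chunks("AAXBB", "AAYBB")` yields one chunk for the shared `AA` and one for the shared `BB`. Because a single alignment is traced back, every chunk appears in the same order in both strings, so matches that were moved by a rearrangement are not reported, consistent with the excerpt's remark that this approach might not obtain all candidate matches.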

Journal:
  • Bioinformatics

Volume 19 Suppl 1  Issue 

Pages  -

Publication date 2003